The title of the application as originally filed reads as follows:
"New sequences of hepatitis C virus genotypes and their use as 
therapeutic and diagnostic agents". 



23. AUG. 2004 9=53 INNOGENETICS NV 32 9 2410966 NR. 705 



NEW SEQUENCES OF HEPATITIS C VIRUS GENOTYPES AND THEIR 
USE AS THERAPEUTIC AND DIAGNOSTIC AGENTS 

The invention relates to new sequences of hepatitis C virus genotypes 
and their use as therapeutic and diagnostic agents. 

The present invention relates to new nucleotide and amino acid 
sequences corresponding to type-specific regions of Hepatitis C virus type 3 and
the coding region of Hepatitis C virus type 4, a process for preparing them, and 
their use for diagnosis, prophylaxis and therapy. 

The technical problem underlying the present invention is to provide 
new type-specific sequences of the Core, the E1, the NS3, the NS4 and the NS5
regions of HCV type 3 and type 4. These new HCV sequences are useful to 
diagnose the presence of type 3 and/or type 4 HCV genotypes present in a 
biological sample. Moreover, the availability of these new type-specific 
sequences can increase the overall sensitivity of HCV detection and should also 
prove to be useful for therapeutic purposes. 

Hepatitis C viruses (HCV) have been found to be the major cause of 
non-A, non-B hepatitis. The sequences of cDNA clones covering the complete 
genome of several prototype isolates have already been determined (Kato et al.,
1990; Choo et al., 1991; Okamoto et al., 1991; Okamoto et al., 1992).
Comparison of these isolates shows that the variability in nucleotide sequences
can be used to distinguish at least 2 different genotypes, type 1 (HCV-1 and
HCV-J) and type 2 (HC-J6 and HC-J8), with an average homology of about
68%. Within each type, at least two subtypes exist (e.g. represented by HCV-1
and HCV-J), having an average homology of about 79%. HCV genomes
belonging to the same subtype show average homologies of more than 90%
(Okamoto et al., 1992). However, the partial nucleotide sequence of the NS5
region of the HCV-T isolates showed at most 67% homology with the
previously published sequences, indicating the existence of a new type (Mori et
al., 1992). Parts of the 5' untranslated region (UR), core, NS3, and NS5
regions of this type 3 have been published, further establishing the similar
evolutionary distances between the 3 major genotypes and their subtypes (Chan
et al., 1992).
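
The homology figures quoted in this paragraph can be read as pairwise percentages of identical positions. The following is a minimal sketch of such a calculation, assuming that homology is taken as percent identity and that the two sequences have already been aligned to the same length. The function name and the toy fragments are illustrative and are not taken from the cited isolates.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences.

    Assumptions: both sequences have the same length, and any position
    where either sequence carries a gap ('-') is skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap positions are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy 10-nucleotide fragments (hypothetical, for illustration only).
print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))
```

Values computed in this way for real isolates could then be set against the averages stated above (about 68% between types, about 79% between subtypes, more than 90% within a subtype).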

The identification of type 3 genotypes in clinical samples can be 
achieved by means of PCR with type-specific primers for the NS5 region.
However, the degree to which this will be successful is largely dependent on
sequence variability and on the virus titer present in the serum. Therefore,

The LiPA format is completely compatible with commercially available
scanning devices, thus rendering automatic interpretation of the results very
reliable. All those advantages make the LiPA format suitable for HCV
detection in a routine setting. The LiPA format should be particularly
advantageous for detecting the presence of different HCV genotypes.

The present invention also relates to a method for detecting and 
identifying novel HCV genotypes, different from the known HCV genomes, 
comprising the steps of: 

- determining to which HCV genotype the nucleotides present in a 
biological sample belong, according to the process as defined above, 

- in the case of observing a sample which does not generate a 
hybridization pattern compatible with those defined in Table 3, sequencing the
portion of the HCV genome sequence corresponding to the aberrantly 
hybridizing probe of the new HCV genotype to be determined. 
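
The two steps above amount to a lookup of the observed probe reactivity against the expected patterns, with any unmatched sample flagged for sequencing. The following is a minimal sketch of that decision only. Table 3 is not reproduced in this passage, so the probe names and the expected patterns below are hypothetical placeholders.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical stand-in for Table 3: each genotype maps to the set of
# probes expected to give a signal. The probe names are invented.
EXPECTED_PATTERNS: Dict[str, FrozenSet[str]] = {
    "type 1": frozenset({"probe_A", "probe_B"}),
    "type 2": frozenset({"probe_C"}),
    "type 3": frozenset({"probe_D", "probe_E"}),
}


def interpret_pattern(reactive_probes: Iterable[str]) -> Optional[str]:
    """Return the genotype whose expected pattern equals the observed one.

    Returns None when no expected pattern is compatible, in which case
    the sample is a candidate for sequencing of the region covered by
    the aberrantly hybridizing probe.
    """
    observed = frozenset(reactive_probes)
    for genotype, expected in EXPECTED_PATTERNS.items():
        if observed == expected:
            return genotype
    return None


for sample in (["probe_D", "probe_E"], ["probe_A", "probe_E"]):
    result = interpret_pattern(sample)
    print(sample, "->", result if result else "no match, sequence the region")
```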

The present invention also relates to the use of a composition as defined 
above, for detecting one or more genotypes of HCV present in a biological 
sample liable to contain them, comprising the steps of: 

(i) possibly extracting sample nucleic acid, 

(ii) amplifying the nucleic acid with at least one of the primers as defined 
above, 

(iii) sequencing the amplified products,

(iv) inferring the HCV genotypes present from the determined sequences 
by comparison to all known HCV sequences. 

The present invention also relates to a composition consisting of or 
comprising at least one peptide or polypeptide comprising a contiguous 
sequence of at least 5 amino acids corresponding to an amino acid sequence 
encoded by at least one of the HCV genomic regions as defined above, having 
at least one amino acid differing from the corresponding region of HCV type 1 
and/or type 2 polyprotein sequences, or muteins thereof. 
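
The requirement of at least 5 contiguous amino acids with at least one difference from a reference sequence can be illustrated by enumerating 5-residue windows. The following is a minimal sketch. The function names are illustrative, and the reference used here is the type 4 V5 sequence given elsewhere in this document (SEQ ID NO 92), which merely stands in for a type 1 or type 2 sequence that is not reproduced in this passage.

```python
from typing import List


def windows(peptide: str, size: int = 5) -> List[str]:
    """All contiguous windows of `size` residues in `peptide`."""
    return [peptide[i:i + size] for i in range(len(peptide) - size + 1)]


def differing_windows(peptide: str, reference: str, size: int = 5) -> List[str]:
    """Windows of `peptide` that do not occur anywhere in `reference`.

    A window absent from the reference differs from it by at least one
    amino acid, whichever way the two sequences are aligned.
    """
    return [w for w in windows(peptide, size) if w not in reference]


type3_v5 = "RPRRHQTVQT"   # SEQ ID NO 91, as given in this document
reference = "RPRQHATVQN"  # SEQ ID NO 92, used here only as a stand-in reference

for window in differing_windows(type3_v5, reference):
    print(window)
```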

The new type 3 amino acid sequences, as deduced from the disclosed 
nucleotide sequences (see SEQ ID NO 1 to 42), show homologies of only 59.9 
to 78% with prototype sequences of type 1 and 2 for the NS4 region, and of 
only 53.9 to 68.8% with prototype sequences of type 1 and 2 for the E1 region.
As the NS4 region is known to contain several epitopes, for example
characterized in patent application EP-A-0 489 968, and as the E1 protein is
expected to be subject to immune attack as part of the viral envelope and
expected to contain epitopes, the NS4 and E1 epitopes of the new type 3 and 4

vol. 15-I et II, THIEME, Stuttgart 1974.

The polypeptides of the invention can also be prepared in solid phase 
according to the methods described by Atherton and Shepard in their book 
entitled "Solid phase peptide synthesis" (IRL Press, Oxford, 1989).

The polypeptides according to this invention can be prepared by means
of recombinant DNA techniques as described by Maniatis et al. (Molecular
Cloning: A Laboratory Manual, New York, Cold Spring Harbor Laboratory,
1982).

The present invention relates more particularly to a composition as 
defined above, with said polypeptide or peptide having at least one of the
following amino acids in its peptidic chain:

- A1S7, F182, I186, H187, A190, S191 or G191, L192, W194, V202,
L203, V219, I227, Q231, T237 or A237, T240, Y250, T254, S260, M271,
M280, Q299, T303, L308, and L313 for the Core/E1 region, and D1556,
Q1579, L1581, S1584, F1585, E1606, V1612, P1630, C1636, T1656, L1663,
H1685, E1687, G1689, Y1705, A1714, A1721, V1723, H1726, R1738,
Q1743, A1744, E1747, I1749, A1751, A1759 and H1762 for the NS3/NS4
region, as detected in type 3 sequences of the present invention,

- M44, Q70, A87, N106, K115, G142, I144, I178, P193, Y194, A197,
M231, T232, V235, I242, S247, P249, S250, L251, V254, P257, A261, Y264,
A266, G268, A280, L284, Y293, Q297, A299, and N303 in the Core/E1
region, and H1310, V1312, Q1321, P1368, V1372, N1399, F1648, P1651,
V1667, T1669, A1681, A1700, Q1704, A1713, S1714, M1718, D1719, T1721,
R1722, A1723, G1726, F1735, I1736, S1737, T1739, G1740, K1742, T1745,
L1746, K1747, A1750, V1753, N1755, A1757, D1758, T1763, and Y1764 for
the NS3/NS4 region, as detected in type 4 sequences of the present invention,

- D217, A213, A256, R294, V1677, Q1704, E1730, V1732, Q1741
and T1751 for the NS3/NS4 regions, as detected in type 3 and 4 sequences of
the present invention, and with said notation being composed of a letter,
unambiguously representing the amino acid by its one-letter code, and a number
representing the amino acid numbering according to Kato et al., 1990 (see also
Table 1 for comparison with other isolates).

For example, M231 refers to a methionine at position 231. A glutamine
(Q) is present at the same position 231 in type 3 isolates, whereas this position
is occupied by an arginine (R) in type 1 isolates and by a lysine (K) or asparagine
(N) in type 2 isolates (see Figure 1A).
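
The letter-plus-number notation can be handled mechanically. The following is a minimal sketch, assuming an ungapped peptide whose first residue has a known polyprotein number. The function names are illustrative, and the example peptide is VKYVGATTAS (SEQ ID NO 96), listed in this document at positions 248 to 257.

```python
import re
from typing import Tuple

# One-letter amino acid code followed by a position number, e.g. "M231".
_NOTATION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)$")


def parse_residue(notation: str) -> Tuple[str, int]:
    """Split a notation such as 'M231' into ('M', 231)."""
    match = _NOTATION.match(notation.strip())
    if match is None:
        raise ValueError(f"not a residue notation: {notation!r}")
    return match.group(1), int(match.group(2))


def has_residue(sequence: str, first_position: int, notation: str) -> bool:
    """True if `sequence` carries the given residue at the given position.

    `first_position` is the polyprotein number of sequence[0]
    (assumption: the sequence contains no gaps).
    """
    amino_acid, position = parse_residue(notation)
    index = position - first_position
    return 0 <= index < len(sequence) and sequence[index] == amino_acid


print(parse_residue("M231"))
print(has_residue("VKYVGATTAS", 248, "Y250"))
print(has_residue("VKYVGATTAS", 248, "T254"))
```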

The peptide or polypeptide according to this embodiment of the 

acids Q299 and T303 are unique for type 3 isolates. The type 4 isolate shows
the following unique V5 sequence: RPRQHATVQN (SEQ ID NO 92), of which 
Q297, A299, and N303 are unique for type 4. Amino acid R294 is unique for 
type 3 and 4 isolates. 

Consequently, the present invention also relates to a composition as 
defined above, wherein said peptides or polypeptides contain in their peptidic 
chain an amino acid sequence selected from any of the regions spanning the
following positions of HCV type 3 polyproteins: 

- positions 140 to 319 in the Core/E1 region, more particularly a
composition wherein said polypeptide or peptide corresponds to a sequence 
within any of the amino acid sequences as represented in SEQ ID NO 14, 16, 
18, 20, 22, 24, 26 or 28, or any other HCV amino acid sequence having a 
homology of more than 69%, preferably more than 70%, and most preferably 
more than 72% in the E1 region spanning positions 192 to 319 to any of the
amino acid sequences as represented in SEQ ID NO 14, 16, 18, 20, 22, 24, 26
or 28, preferably a composition containing at least one of the following 
polypeptides: 

LEWRNTSGLYVL (SEQ ID NO 83), VYEADDVELHA (SEQ ID NO 
85), VQDGNTST (SEQ ID NO 94), VQDGNTSA (SEQ ID NO 95), 
VQDGNTSTCWTPV (SEQ ID NO 87), VKYVGATTAS (SEQ ID NO 96),
VRYVGATTAS (SEQ ID NO 89), RPRRHQTVQT (SEQ ID NO 91), or any 
synthetic peptide or polypeptide containing at least 5 contiguous amino acids 
derived from the above-defined peptides in their peptidic chain. 

- positions 1646 to 1764 in the NS3/NS4 region, more particularly a 
composition wherein said peptide or polypeptide corresponds to a sequence 
within any of the amino acid sequences as represented in SEQ ID NO 30, 32, 
34, 36, 38 or 40, or any other HCV amino acid sequence having a homology of 
more than 76%, preferably more than 78%, most preferably more than 80% to 
any of the amino acid sequences as represented in SEQ ID NO 30, 32, 34, 36, 
38 or 40, in the region spanning positions 1646 to 1764, preferably a
composition containing at least one of the following polypeptides: 
LGGKPAIVPDKEVLYQQYDE (SEQ ID NO 97),
LGGKPALVPDKEVLYQQYDE (SEQ ID NO 98),
SQAAPYIEQAQVIAHQFKEK (SEQ ID NO 99),
IAHQFKEKVLGLLQRATQQQ (SEQ ID NO 100),
IAHQFKEKCLGLLQRATQQQ (SEQ ID NO 101),

or any synthetic peptide or polypeptide containing at least 5 contiguous 

(iii) ANTI-SENSE: NO

(vii) IMMEDIATE SOURCE:

(B) CLONE: BR36-20-164

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 3..401

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

TC CAA AAT GAA ATC TGC TTG ACA CAC CCC ATC ACA AAA TAC ATC ATG 47
   Gln Asn Glu Ile Cys Leu Thr His Pro Ile Thr Lys Tyr Ile Met
   1               5                   10                  15

GCA TGC ATG TCA GCT GAT CTG GAA GTA ACC ACC AGC ACC TGG GTT TTG 95
Ala Cys Met Ser Ala Asp Leu Glu Val Thr Thr Ser Thr Trp Val Leu
                20                  25                  30

CTT GGA GGG GTC CTC GCG GCC CTA GCG GCC TAC TGC TTG TCA GTC GGT 143
Leu Gly Gly Val Leu Ala Ala Leu Ala Ala Tyr Cys Leu Ser Val Gly
            35                  40                  45

TGT GTT GTG ATT GTG GGT CAT ATC GAG CTG GGG GGC AAG CCG GCA ATC 191
Cys Val Val Ile Val Gly His Ile Glu Leu Gly Gly Lys Pro Ala Ile
        50                  55                  60

GTT CCA GAC AAA GAG GTG TTG TAT CAA CAA TAC GAT GAG ATG GAA GAG 239
Val Pro Asp Lys Glu Val Leu Tyr Gln Gln Tyr Asp Glu Met Glu Glu
    65                  70                  75

TGC TCA CAA GCT GCC CCA TAT ATC GAA CAA GCT CAG GTA ATA GCT CAC 287
Cys Ser Gln Ala Ala Pro Tyr Ile Glu Gln Ala Gln Val Ile Ala His
80                  85                  90                  95

CAG TTC AAG GGA AAA GTC CTT GGA TTG CTG CAG CGA GCC ACC CAA CAA 335
Gln Phe Lys Gly Lys Val Leu Gly Leu Leu Gln Arg Ala Thr Gln Gln
                100                 105                 110

CAA GCT GTC ATT GAG CCC ATA GTA ACT ACC AAC TGG CAA AAG CTT GAG 383
Gln Ala Val Ile Glu Pro Ile Val Thr Thr Asn Trp Gln Lys Leu Glu
            115                 120                 125

GCC TTT TGG CAC AAG CAT 401
Ala Phe Trp His Lys His
        130
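
The relation between the nucleotide lines and the translation printed beneath them can be checked by translating the coding sequence (CDS, positions 3 to 401) with the standard genetic code. The following is a minimal sketch. The nucleotide string is transcribed from the listing for SEQ ID NO 35, and where a printed codon is ambiguous it is read so as to match the translation line beneath it. The polyprotein numbering at the end is an inference from the map position listed for SEQ ID NO 97 (positions 1688 to 1707) and is an assumption rather than a statement of the listing.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# cDNA of SEQ ID NO 35 as transcribed from the listing (401 bases).
SEQ_ID_NO_35 = """
TC CAA AAT GAA ATC TGC TTG ACA CAC CCC ATC ACA AAA TAC ATC ATG
GCA TGC ATG TCA GCT GAT CTG GAA GTA ACC ACC AGC ACC TGG GTT TTG
CTT GGA GGG GTC CTC GCG GCC CTA GCG GCC TAC TGC TTG TCA GTC GGT
TGT GTT GTG ATT GTG GGT CAT ATC GAG CTG GGG GGC AAG CCG GCA ATC
GTT CCA GAC AAA GAG GTG TTG TAT CAA CAA TAC GAT GAG ATG GAA GAG
TGC TCA CAA GCT GCC CCA TAT ATC GAA CAA GCT CAG GTA ATA GCT CAC
CAG TTC AAG GGA AAA GTC CTT GGA TTG CTG CAG CGA GCC ACC CAA CAA
CAA GCT GTC ATT GAG CCC ATA GTA ACT ACC AAC TGG CAA AAG CTT GAG
GCC TTT TGG CAC AAG CAT
"""


def translate(cds: str) -> str:
    """Translate a coding sequence into one-letter amino acid codes."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


cdna = "".join(SEQ_ID_NO_35.split())
cds = cdna[2:401]            # CDS location 3..401, converted to a 0-based slice
protein = translate(cds)

# Assumption: SEQ ID NO 97 starts at polyprotein position 1688.
peptide_97 = "LGGKPAIVPDKEVLYQQYDE"
first_position = 1688 - protein.find(peptide_97)
last_position = first_position + len(protein) - 1

print(len(cdna), len(protein))
print(protein)
print(first_position, last_position)
```

Under that assumption the translated fragment ends at the upper bound of the NS3/NS4 region given in the description, and residues such as H1685, E1687 and H1762 from the type 3 list fall on the corresponding amino acids of the translation.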



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 133 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

Gln Asn Glu Ile Cys Leu Thr His Pro Ile Thr Lys Tyr Ile Met Ala
1               5                   10                  15

Cys Met Ser Ala Asp Leu Glu Val Thr Thr Ser Thr Trp Val Leu Leu
            20                  25                  30

Gly Gly Val Leu Ala Ala Leu Ala Ala Tyr Cys Leu Ser Val Gly Cys
        35                  40                  45

Val Val Ile Val Gly His Ile Glu Leu Gly Gly Lys Pro Ala Ile Val
    50                  55                  60

Pro Asp Lys Glu Val Leu Tyr Gln Gln Tyr Asp Glu Met Glu Glu Cys
65                  70                  75                  80

Ser Gln Ala Ala Pro Tyr Ile Glu Gln Ala Gln Val Ile Ala His Gln
                85                  90                  95

Phe Lys Gly Lys Val Leu Gly Leu Leu Gln Arg Ala Thr Gln Gln Gln
            100                 105                 110

Ala Val Ile Glu Pro Ile Val Thr Thr Asn Trp Gln Lys Leu Glu Ala
        115                 120                 125

Phe Trp His Lys His
    130

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 401 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iii) ANTI-SENSE: NO

(vii) IMMEDIATE SOURCE:

(B) CLONE: BR36-20-1S6

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 3..401

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

TC CAA AAT GAA ATC TGC TTG ACA CAC CCC ATC ACA AAA TAC ATC ATG 47
   Gln Asn Glu Ile Cys Leu Thr His Pro Ile Thr Lys Tyr Ile Met
   1               5                   10                  15

GCA TGC ATG TCA GCT GAT CTG GAA GTA ACC ACC AGC ACC TGG GTT TTG 95
Ala Cys Met Ser Ala Asp Leu Glu Val Thr Thr Ser Thr Trp Val Leu
                20                  25                  30

CTT GGA GGG GTC CTC GCG GCC CTA GCG GCC TAC TGC TTG TCA GTC GGT 143
Leu Gly Gly Val Leu Ala Ala Leu Ala Ala Tyr Cys Leu Ser Val Gly
            35                  40                  45

(B) MAP POSITION: positions 248 to 257 of the V4 region of HCV type 3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96:

Val Lys Tyr Val Gly Ala Thr Thr Ala Ser
1               5                   10

(2) INFORMATION FOR SEQ ID NO: 97: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: BR36

(viii) POSITION IN GENOME:

(B) MAP POSITION: positions 1688 to 1707 of HCV type 3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97:

Leu Gly Gly Lys Pro Ala Ile Val Pro Asp Lys Glu Val Leu Tyr Gln
1               5                   10                  15

Gln Tyr Asp Glu
            20

(2) INFORMATION FOR SEQ ID NO: 98:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: HD10

(viii) POSITION IN GENOME:

(B) MAP POSITION: positions 1688 to 1707 of HCV type 3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98:

Leu Gly Gly Lys Pro Ala Leu Val Pro Asp Lys Glu Val Leu Tyr Gln
1               5                   10                  15

Gln Tyr Asp Glu
            20

(2) INFORMATION FOR SEQ ID NO: 99:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO

(viii) POSITION IN GENOME:

(B) MAP POSITION: positions 1712 to 1731

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99:

Ser Gln Ala Ala Pro Tyr Ile Glu Gln Ala Gln Val Ile Ala His Gln
1               5                   10                  15

Phe Lys Glu Lys
            20

(2) INFORMATION FOR SEQ ID NO: 100:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: BR36

(viii) POSITION IN GENOME:

(B) MAP POSITION: positions 1724 to 1743 of HCV type 3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100:

Ile Ala His Gln Phe Lys Glu Lys Val Leu Gly Leu Leu Gln Arg Ala
1               5                   10                  15

Thr Gln Gln Gln
            20

(2) INFORMATION FOR SEQ ID NO: 101:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO